Algorithm 13 Training 1-bit detectors via LWS-Det.

Input: The training dataset, pre-trained teacher model.
Output: 1-bit detector.
1: Initialize $\alpha_i$ and $\beta_i^{o_k} \sim \mathcal{N}(0, 1)$ and other real-valued parameters layer-wise;
2: for $i = 1$ to $N$ do
3:     while Differentiable search do
4:         Compute $L_i^{Ang}$, $L_i^{Amp}$, $L_i^{W}$
5:     end while
6: end for
7: Compute $L^{GT}$, $L^{Lim}$
8: for $i = N$ to $1$ do
9:     Update parameters via back propagation
10: end for
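To make the control flow of Algorithm 13 concrete, the sketch below mirrors its steps in PyTorch. It is only a skeleton under our own assumptions: the per-layer losses $L_i^{Ang}$, $L_i^{Amp}$, $L_i^{W}$ and the task losses $L^{GT}$, $L^{Lim}$ are stubbed with placeholder tensors, and the helper names are ours, not the authors' released code.

```python
import torch

N = 4                                                             # number of 1-bit conv layers
alpha = [torch.randn(1, requires_grad=True) for _ in range(N)]    # scale factors alpha_i
beta = [torch.randn(2, 8, requires_grad=True) for _ in range(N)]  # beta_i^{o_k} ~ N(0, 1)
opt = torch.optim.SGD(alpha + beta, lr=0.1)

def layer_losses(i):
    # Placeholders for L_i^{Ang}, L_i^{Amp}, L_i^{W} of layer i (step 4)
    return beta[i].pow(2).mean(), alpha[i].pow(2).mean(), beta[i].abs().mean()

def task_losses():
    # Placeholders for the detection loss L^{GT} and feature limitation L^{Lim} (step 7)
    return sum(a.sum() for a in alpha).pow(2), torch.zeros(())

for step in range(10):
    search = sum(sum(layer_losses(i)) for i in range(N))  # steps 2-6: layer-wise search losses
    l_gt, l_lim = task_losses()                           # step 7
    loss = l_gt + l_lim + search
    opt.zero_grad()
    loss.backward()                                       # steps 8-10: back-propagate through all layers
    opt.step()
```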

We introduce the DARTS framework to solve Eq. 6.72, named differentiable binarization search (DBS). We follow [151] to efficiently find $w_i$. Specifically, we approximate $w_i$ by the weighted probability of two matrices whose weights are all set to $-1$ and $+1$, respectively. We relax the choice of a particular weight by the probability function defined as

$$p_i^{o_k} = \frac{\exp\bigl(\beta_i^{o_k}\bigr)}{\sum_{o_k \in O} \exp\bigl(\beta_i^{o_k}\bigr)}, \quad \text{s.t.} \quad O = \{w_i^-, w_i^+\}, \tag{6.73}$$

where $p_i^{o_k}$ is the probability matrix belonging to the operation $o_k \in O$. The search space $O$ is defined as the two possible weights, $\{w_i^-, w_i^+\}$. For the inference stage, we select the weight with the maximum probability as

weight owning the max probability as

$$\hat{w}_{i,l} = \arg\max_{o_k} \, p_{i,l}^{o_k}, \tag{6.74}$$

where $p_{i,l}^{o_k}$ denotes the probability that the $l$-th weight of the $i$-th layer belongs to operation $o_k$. Therefore, the $l$-th weight of $\hat{w}$, that is, $\hat{w}_{i,l}$, is defined by the operation having the highest probability. In this way, we modify Eq. 6.87 by substituting $\hat{w}_i$ for $w_i$ as

$$L_i^{Ang} = \left\| \frac{a_{i-1} w_i}{\|a_{i-1}\|_2 \|w_i\|_2} - \frac{a_{i-1} \hat{w}_i}{\|a_{i-1}\|_2 \|\hat{w}_i\|_2} \right\|_2^2. \tag{6.75}$$

Thus, we retain the top-1 strongest operation (from distinct weights) for each weight of $w_i$ in the discrete set $\{-1, +1\}$.
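As a sanity check of Eqs. 6.73-6.75, here is a small PyTorch sketch for a single layer. It stands in for the 1-bit convolution with an elementwise product, uses a probability-weighted mixture of $\{-1, +1\}$ to keep the search differentiable, and all shapes and variable names are our illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
L = 16                                        # weights in layer i
w_i = torch.randn(L)                          # real-valued (teacher) weight w_i
a_prev = torch.randn(L)                       # input activation a_{i-1}
beta = torch.randn(2, L, requires_grad=True)  # beta_i^{o_k} ~ N(0, 1)

# Eq. 6.73: softmax over the two operations O = {w_i^-, w_i^+}
p = F.softmax(beta, dim=0)                    # p[0] -> -1, p[1] -> +1

# During search: the probability-weighted mixture keeps gradients flowing to beta
w_mix = p[1] - p[0]                           # (+1) p^+ + (-1) p^-

# Eq. 6.74: at inference, keep the operation with the highest probability
w_hat = torch.where(p[1] >= p[0], torch.ones(L), -torch.ones(L))

# Eq. 6.75: angular loss between the normalized real and binarized responses
real = a_prev * w_i / (a_prev.norm() * w_i.norm())
binr = a_prev * w_mix / (a_prev.norm() * w_mix.norm())
l_ang = (real - binr).pow(2).sum()
l_ang.backward()                              # gradients update the search parameters beta
```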

6.4.4 Learning the Scale Factor

After searching for $w_i$, we learn the real-valued layers between the $i$-th and $(i+1)$-th 1-bit convolutions. We omit the batch normalization (BN) and activation layers for simplicity. We can directly simplify Eq. 6.69 as

$$L_i^{Amp} = E_i(\alpha_i; w_i, \hat{w}_i, a_{i-1}, a_{i-1}). \tag{6.76}$$
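The exact form of $E_i$ is fixed by Eq. 6.69 earlier in the chapter; purely as an illustration, the sketch below assumes the common BNN reading in which $\alpha_i$ minimizes the squared error between the real-valued response $a_{i-1} w_i$ and the scaled binary response $\alpha_i (a_{i-1} \hat{w}_i)$.

```python
import torch

torch.manual_seed(0)
L = 16
a_prev, w_i = torch.randn(L), torch.randn(L)  # activation a_{i-1}, real-valued weight w_i
w_hat = torch.sign(w_i)                       # stand-in for the searched 1-bit weight
alpha = torch.ones(1, requires_grad=True)     # scale factor alpha_i

opt = torch.optim.SGD([alpha], lr=0.05)
for _ in range(200):
    # Assumed E_i: squared error between real and alpha-scaled binary responses
    l_amp = (a_prev * w_i - alpha * (a_prev * w_hat)).pow(2).mean()
    opt.zero_grad()
    l_amp.backward()
    opt.step()

# Least-squares check: the optimum is alpha* = <y, y_b> / <y_b, y_b>
y, y_b = a_prev * w_i, a_prev * w_hat
print(alpha.item(), (y @ y_b / (y_b @ y_b)).item())
```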

Following conventional BNNs [77, 287], we employ Eq. 6.80 to further supervise the scale factor $\alpha_i$. According to [235], we impose a fine-grained limitation on the features to aid the prior detection. Hence, the supervision of LWS-Det is formulated as

$$L = L^{GT} + \lambda L^{Lim} + \mu \sum_{i=1}^{N} \left( L_i^{Ang} + L_i^{Amp} \right) + \gamma \sum_{i=1}^{N} L_i^{W}, \tag{6.77}$$
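Assembling Eq. 6.77 is then a weighted sum of the four groups of terms; a short sketch follows, where the balance weights $\lambda$, $\mu$, $\gamma$ and the stand-in loss tensors are illustrative only.

```python
import torch

N = 4
l_gt, l_lim = torch.rand(()), torch.rand(())  # L^{GT}, L^{Lim}
l_ang = [torch.rand(()) for _ in range(N)]    # L_i^{Ang}
l_amp = [torch.rand(()) for _ in range(N)]    # L_i^{Amp}
l_w = [torch.rand(()) for _ in range(N)]      # L_i^{W}

lam, mu, gamma = 0.1, 0.1, 0.01               # illustrative hyperparameters
loss = (l_gt + lam * l_lim
        + mu * sum(a + m for a, m in zip(l_ang, l_amp))
        + gamma * sum(l_w))
```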